feat: Add Ollama and local model support#26

Open
FrenzyVJN wants to merge 1 commit into huggingface:main from FrenzyVJN:main

Conversation

@FrenzyVJN

Add Ollama and Local Model Support

Summary

This PR adds comprehensive support for using Ollama and other local LLM providers (like LM Studio) with upskill's generate and eval commands. Users can now run skill generation and evaluation entirely locally without requiring API keys from cloud providers.

Motivation

Currently, upskill requires API keys for cloud providers (Anthropic, OpenAI, etc.) to function. This PR enables:

  • Privacy: Run entirely locally without sending data to external APIs
  • Cost savings: Use free local models instead of paid API calls
  • Offline usage: Work without internet connectivity
  • Model flexibility: Test with any model supported by Ollama (llama3.2, qwen2.5-coder, etc.)

Changes

New Command-Line Flags

Both generate and eval commands now support:

  • --base-url <url>: Custom API endpoint for local models (e.g., http://localhost:11434/v1)
  • --provider <name>: API provider (optional, auto-detected as generic when --base-url is provided)

Implementation Details

  1. Model Factory Monkey Patch

    • Patches FastAgent's ModelFactory.parse_model_string() to handle unknown model names
    • Falls back to generic provider when GENERIC_BASE_URL environment variable is set
    • Catches ModelConfigError and returns ModelConfig(provider=Provider.GENERIC, model_name=model_string)
  2. Environment Variable Configuration

    • Sets GENERIC_BASE_URL to point to local API endpoint
    • Sets GENERIC_API_KEY to "local" (required but unused by Ollama)
    • Sets ANTHROPIC_API_KEY to "dummy" to bypass startup checks
  3. Model String Formatting

    • For non-generic providers: prepends provider prefix (e.g., anthropic.claude-3-5-sonnet)
    • For generic provider: passes model name as-is (e.g., llama3.2:latest)
    • Monkey patch handles unknown models automatically
  4. Small Model Improvements

    • Updated generation prompt to be more explicit and direct
    • Added code fence stripping for models that wrap output in markdown blocks
    • This helps smaller models generate valid SKILL.md format
  5. Skills Directory Support

    • FastAgent now loads skills from ./skills/ directory if it exists
    • Added {{agentSkills}} placeholder to skill_gen agent card
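The fallback in step 1 can be sketched as follows. The FastAgent types (`ModelFactory`, `ModelConfig`, `Provider`, `ModelConfigError`) are replaced with minimal stand-ins here so the sketch is self-contained and runnable; only the wrapper logic at the bottom mirrors what the PR actually does, and the exact FastAgent signatures may differ.

```python
import os

# Stand-ins for FastAgent's types, so this sketch is self-contained.
# In the real patch these come from FastAgent's model factory module.
class ModelConfigError(Exception):
    pass

class ModelConfig:
    def __init__(self, provider, model_name):
        self.provider = provider
        self.model_name = model_name

class Provider:
    GENERIC = "generic"

class ModelFactory:
    _KNOWN = {"anthropic.claude-3-5-sonnet"}

    @classmethod
    def parse_model_string(cls, model_string):
        if model_string not in cls._KNOWN:
            raise ModelConfigError(f"unknown model: {model_string}")
        return ModelConfig("anthropic", model_string)

# The patch: wrap the original classmethod and fall back to the
# generic provider whenever GENERIC_BASE_URL is set.
_original = ModelFactory.parse_model_string.__func__

def _patched(cls, model_string):
    try:
        return _original(cls, model_string)
    except ModelConfigError:
        if os.environ.get("GENERIC_BASE_URL"):
            return ModelConfig(Provider.GENERIC, model_string)
        raise

ModelFactory.parse_model_string = classmethod(_patched)

# With the endpoint configured, unknown Ollama-style names now resolve.
os.environ["GENERIC_BASE_URL"] = "http://localhost:11434/v1"
cfg = ModelFactory.parse_model_string("llama3.2:latest")
print(cfg.provider, cfg.model_name)  # generic llama3.2:latest
```

Known model strings still go through the original parser unchanged, so cloud-provider behavior is unaffected.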

Usage Examples

Generate Skill with Ollama

# Generate skill without evaluation
upskill generate "parse YAML files" \
  --model llama3.2:latest \
  --base-url http://localhost:11434/v1 \
  --no-eval \
  -o ./my-skill

# Generate with explicit provider
upskill generate "document code" \
  --model qwen2.5-coder:7b \
  --provider generic \
  --base-url http://localhost:11434/v1 \
  --no-eval

Evaluate Skill with Local Model

# Evaluate with manual test cases
upskill eval ./skills/my-skill \
  --model qwen2.5-coder:7b \
  --base-url http://localhost:11434/v1 \
  --tests tests.json

# Verbose output
upskill eval ./skills/my-skill \
  --model llama3.2:latest \
  --base-url http://localhost:11434/v1 \
  --tests tests.json \
  -v

Testing

Tested with:

  • ✅ llama3.2:latest: basic generation and evaluation work
  • ✅ qwen2.5-coder:7b: better results, 25-75% success rates on simple tasks
  • ✅ Skills loading from ./skills/ directory
  • ✅ Code fence stripping for wrapped outputs
  • ✅ Generate command with --no-eval
  • ✅ Eval command with manual test cases (--tests)
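The code fence stripping checked off above can be sketched like this. `strip_code_fences` is a hypothetical helper name for illustration; the actual function in `generate.py` may be structured differently.

```python
import re

def strip_code_fences(text: str) -> str:
    """Remove a wrapping markdown code fence, if the model emitted one.

    Illustrative helper; the real implementation lives in generate.py.
    """
    text = text.strip()
    # Match an opening fence (with optional language tag), the wrapped
    # body, and a closing fence at the very end of the output.
    match = re.match(r"^```[a-zA-Z0-9_-]*\n(.*)\n```$", text, re.DOTALL)
    return match.group(1) if match else text

wrapped = "```markdown\n# My Skill\n\nInstructions here.\n```"
print(strip_code_fences(wrapped))
```

Output that is not wrapped in a fence passes through unchanged, so the helper is safe to apply to every model response.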

Example Test Results

Hello World Skill with qwen2.5-coder:7b:

  • Baseline: 50% success (2/4 tests)
  • With Skill: 75% success (3/4 tests)
  • Improvement: +25 percentage points
  • Recommendation: Keep skill ✅

Known Limitations

  1. Test Case Generation: The --eval-model flag with automatic test generation may not work due to FastAgent's structured() method not properly respecting model overrides.

    Workaround: Use --no-eval flag during generation, then create test cases manually and run eval separately:

    # Generate without eval
    upskill generate "task" --model llama3.2 --base-url http://localhost:11434/v1 --no-eval
    
    # Create tests.json manually
    
    # Evaluate with manual tests
    upskill eval ./skill --model llama3.2 --base-url http://localhost:11434/v1 --tests tests.json
  2. Small Model Quality: Models like llama3.2 (3B parameters) may produce lower quality skills compared to larger cloud models. Recommended to use 7B+ models like qwen2.5-coder:7b for better results.

Backwards Compatibility

All changes are fully backwards compatible:

  • Existing commands work exactly as before
  • New flags are optional
  • Default behavior unchanged
  • No breaking changes to API or configuration

Files Changed

  • src/upskill/cli.py (+148 lines): Added Ollama support, new flags, environment setup
  • src/upskill/generate.py (+36 lines): Code fence stripping, improved prompts, RequestParams support
  • src/upskill/agent_cards/skill_gen.md (+2 lines): Added {{agentSkills}} placeholder

Future Improvements

Potential follow-ups (not in this PR):

  • Document recommended model sizes for different tasks
  • Add model capability detection
  • Improve test case generation for local models
  • Add progress indicators for long-running generations
  • Support for other local providers (Llama.cpp, vLLM, etc.)

Checklist

  • Code follows project style guidelines
  • All changes are backwards compatible
  • Tested with multiple local models (llama3.2, qwen2.5-coder)
  • Documentation included in commit message
  • No breaking changes
  • Error handling for missing Ollama server
  • Works with both generate and eval commands

…luation

This commit adds comprehensive support for using Ollama and other local LLM providers (like LM Studio) with upskill's generate and eval commands.

## Changes

### Core Features
- Add --base-url and --provider flags to both generate and eval commands
- Monkey patch FastAgent's ModelFactory to handle unknown model names when GENERIC_BASE_URL is set
- Auto-detect 'generic' provider when --base-url is provided
- Set dummy API keys to bypass authentication checks when using local models

### Generation Improvements
- Update prompt to be more explicit for smaller models
- Add code fence stripping for models that wrap output in markdown blocks
- Pass model parameter through RequestParams to all FastAgent calls
- Support model override for all generation functions (generate_skill, generate_tests, improve_skill, refine_skill)

### Evaluation Improvements
- Add environment variable configuration for eval command
- Format model strings correctly for generic provider
- Support loading skills from ./skills/ directory

### Bug Fixes
- Fix classmethod monkey patch to properly access __func__
- Fix model formatting logic for eval_model parameter
- Add {{agentSkills}} placeholder to skill_gen agent card to enable skill loading
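For context on the `__func__` fix: a classmethod accessed on its class is already a bound method, so wrapping it naively and re-installing it breaks the descriptor. A minimal illustration with a toy class (not FastAgent's actual factory):

```python
class Factory:
    @classmethod
    def parse(cls, s):
        return ("orig", s)

# Factory.parse is a *bound* classmethod; to wrap it and re-install it
# as a classmethod, grab the underlying function via __func__ first.
original = Factory.parse.__func__

def patched(cls, s):
    result = original(cls, s)
    return ("patched", *result)

Factory.parse = classmethod(patched)
print(Factory.parse("x"))  # ('patched', 'orig', 'x')
```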

## Usage Examples

Generate skill with Ollama:
  upskill generate "parse YAML" --model llama3.2:latest \
    --base-url http://localhost:11434/v1 --no-eval

Evaluate skill with local model:
  upskill eval ./skills/my-skill --model qwen2.5-coder:7b \
    --base-url http://localhost:11434/v1 --tests tests.json

## Technical Details

The implementation uses FastAgent's generic provider support with environment variables:
- GENERIC_BASE_URL: Points to local API endpoint (e.g., http://localhost:11434/v1)
- GENERIC_API_KEY: Set to "local" (required but unused by Ollama)
- ANTHROPIC_API_KEY: Set to "dummy" to bypass startup checks

The monkey patch catches ModelConfigError for unknown models and falls back to generic provider when GENERIC_BASE_URL is configured.

## Limitations

Test case generation with --eval-model in generate command may not work due to FastAgent's structured() method not properly respecting model overrides. Workaround: use --no-eval and provide test cases manually with --tests flag.
@evalstate
Collaborator

Thanks for the patch -- for these models generic.qwen2.5-coder:7b should work out of the box?

@FrenzyVJN
Author

Good question — I tested this flow to confirm the behavior.

Currently, generic.qwen2.5-coder:7b does not work without additional configuration, because the generic provider still needs a base_url to know which API endpoint to call. Without it, FastAgent falls back to the default provider and errors (e.g., missing Anthropic key).

There is also a secondary issue: when using the generic provider, the full model string (generic.qwen2.5-coder:7b) is forwarded to Ollama, but Ollama expects just qwen2.5-coder:7b.

The --base-url approach resolves both problems by:

  • explicitly selecting the generic-compatible endpoint
  • avoiding unintended provider fallback
  • keeping the CLI usage predictable for local model setups

That said, we could improve this further by stripping the generic. prefix before sending the request. This would make the behavior more forgiving and align better with user expectations.

Happy to implement that if you think it's the right direction — otherwise I'm comfortable keeping the current explicit configuration to avoid hidden magic.
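That prefix stripping could be as simple as the following sketch (illustrative only, not part of this PR; `normalize_model_name` is a hypothetical name):

```python
def normalize_model_name(model_string: str) -> str:
    """Strip a leading 'generic.' so Ollama sees the bare model name.

    Illustrative only; not implemented in this PR.
    """
    prefix = "generic."
    if model_string.startswith(prefix):
        return model_string[len(prefix):]
    return model_string

print(normalize_model_name("generic.qwen2.5-coder:7b"))  # qwen2.5-coder:7b
```

Names without the prefix pass through untouched, so existing invocations would be unaffected.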
